A Bag Reconstruction Method for Multiple Instance Classification and Group Record Linkage
Abstract
Record linkage is the task of identifying records in several databases that refer to the same entity. It is used to explore relationships between entities which, in heterogeneous datasets, usually lack common identifiers. When an entity is described by multiple related records, linking across datasets can be more accurate if the records are treated as groups, which motivates group linkage methods. However, links between individual records may still be needed for the final group-linking step. This problem can be addressed with multiple instance learning, in which group links are modelled as bags and record links as instances. In this paper, we propose a novel method for instance classification and group record linkage that reconstructs bags from instances. Bag reconstruction is based on modelling the distribution of negative instances in the training bags with kernel density estimation. We evaluate this approach on both synthetic and real-world data. Our results show that the proposed method can outperform several baseline methods.
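The abstract does not spell out the reconstruction procedure. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' method.

- Each instance is a record pair encoded as a vector of similarity scores. This is a hypothetical feature encoding.
- A Gaussian kernel density estimate is fitted to the instances drawn from negative training bags.
- An instance is labelled positive when its density under this negative model falls below a hypothetical `threshold`.
- A bag (a candidate group link) is labelled positive if at least one of its instances is. This is the standard multiple instance assumption.

The sketch does not reproduce the paper's bag reconstruction step. The names `fit_negative_kde`, `classify_instances`, `classify_bag`, `bandwidth` and `threshold` are illustrative only and are not part of any library's API.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def fit_negative_kde(negatives: List[Vector], bandwidth: float) -> Callable[[Vector], float]:
    """Fit a Gaussian kernel density estimate to negative instances.

    Assumption: an isotropic Gaussian kernel with one bandwidth for all dimensions.
    """
    if not negatives:
        raise ValueError("need at least one negative instance")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    dim = len(negatives[0])
    # Normalising constant of a d-dimensional Gaussian with covariance bandwidth^2 * I.
    norm = (2 * math.pi * bandwidth ** 2) ** (dim / 2)

    def density(x: Vector) -> float:
        total = 0.0
        for s in negatives:
            sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
            total += math.exp(-sq_dist / (2 * bandwidth ** 2))
        # Average of the kernels centred on each negative instance.
        return total / (len(negatives) * norm)

    return density


def classify_instances(bag: List[Vector], negative_density: Callable[[Vector], float],
                       threshold: float) -> List[int]:
    """Label an instance 1 (record match) when it is unlikely under the negative model."""
    return [1 if negative_density(x) < threshold else 0 for x in bag]


def classify_bag(bag: List[Vector], negative_density: Callable[[Vector], float],
                 threshold: float) -> int:
    """Standard MI assumption: a bag is positive iff at least one instance is positive."""
    return int(any(classify_instances(bag, negative_density, threshold)))


if __name__ == "__main__":
    # Hypothetical similarity vectors (e.g. name and date similarity) for
    # record pairs taken from non-matching group pairs (negative bags).
    negative_bags = [[[0.1, 0.2], [0.2, 0.1]], [[0.15, 0.25], [0.05, 0.1]]]
    negatives = [x for bag in negative_bags for x in bag]
    dens = fit_negative_kde(negatives, bandwidth=0.1)
    test_bag = [[0.12, 0.18], [0.95, 0.9]]
    print(classify_instances(test_bag, dens, threshold=0.5))  # [0, 1]
    print(classify_bag(test_bag, dens, threshold=0.5))  # 1
```

Consider negative instances clustered around low similarity scores, a bandwidth of 0.1 and a threshold of 0.5. With these settings the second record pair in the test bag is far from every negative instance, so it is labelled a match and the group link is predicted positive. In practice the bandwidth and threshold would need tuning, for example on held-out training bags.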
Similar resources
Multiple Instance Learning for Group Record Linkage
Record linkage is the process of identifying records that refer to the same entities from different data sources. While most research efforts are concerned with linking individual records, new approaches have recently been proposed to link groups of records across databases. Group record linkage aims to determine if two groups of records in two databases refer to the same entity or not. One app...
Interactive Retrieval of Nature Images Using Multiple Instance Learning
Content-based image retrieval (CBIR) has received considerable research interest in recent years. The basic problem in CBIR is the semantic gap between the high-level image semantics and the low-level image features. Region-based image retrieval and learning from user interaction through relevance feedback are two main approaches to solving this problem. Recently, the research in integra...
Group based Self Training for E-Commerce Product Record Linkage
In this paper, we study the task of product record linkage across multiple e-commerce websites. We solve this task via a semi-supervised approach and adopt the self-training algorithm for learning with little labeled data. In previous self-training algorithms, the learner tries to convert the most confidently predicted unlabeled examples of each class into labeled training examples. However, th...
Multiple instance ensemble learning method for high-resolution remote sensing image classification
Multiple Instance Learning Via Embedded Instance Selection (MILES) has shown good performance in dealing with noisy training samples, but its bag prediction rule may introduce new uncertainty into the remote sensing image classification results. In order to overcome this limitation, two popular ensemble learning strategies, Bagging and AdaBoost, are integrated with MILES. Two methods are propose...
Instance Label Prediction by Dirichlet Process Multiple Instance Learning
We propose a generative Bayesian model that predicts instance labels from weak (bag-level) supervision. We solve this problem by simultaneously modeling class distributions by Gaussian mixture models and inferring the class labels of positive bag instances that satisfy the multiple instance constraints. We employ Dirichlet process priors on mixture weights to automate model selection, and effic...